Space Efficiencies in Discourse Modeling via Conditional Random Sampling
Authors
Abstract
Recent exploratory work in discourse-level language modeling has relied heavily on computing Pointwise Mutual Information (PMI), which is computationally expensive over large collections. To make these scores tractable at scale, prior work has resorted to aggressive pruning or independence assumptions. We show that Conditional Random Sampling, a technique that has so far been underutilized, is a space-efficient way to represent the discourse-level sufficient statistics that underlie recent PMI-based work. We demonstrate this in the context of inducing Schankian script-like structures over news articles.
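To make the idea concrete, here is a minimal sketch of Conditional Random Sampling (CRS) feeding a PMI estimate. It assumes a simplified setting that the abstract does not spell out: each item (for example an event or verb) is represented by the set of document ids it occurs in, documents play the role of columns, and occurrence is binary. All function names (`build_sketches`, `estimate_cooccurrence`, `pmi`) are hypothetical. The estimator shown is the simple sample-and-scale form of CRS, not the margin-constrained maximum-likelihood variant, and it is not presented as the authors' implementation.

```python
import math
import random
from typing import Dict, List, Set


def build_sketches(postings: Dict[str, Set[int]], num_docs: int, k: int,
                   seed: int = 0) -> Dict[str, List[int]]:
    """For each item, keep the k smallest permuted ids of the documents it occurs in (k >= 1)."""
    rng = random.Random(seed)
    perm = list(range(num_docs))
    rng.shuffle(perm)  # perm[d] is the permuted id of document d (one shared permutation)
    return {item: sorted(perm[d] for d in docs)[:k]
            for item, docs in postings.items()}


def estimate_cooccurrence(s1: List[int], s2: List[int], f1: int, f2: int,
                          num_docs: int) -> float:
    """Estimate how many documents contain both items, using only their sketches."""
    # A sketch that stores every posting (len == frequency) describes all documents;
    # a truncated sketch is only complete up to its largest stored permuted id.
    cover1 = num_docs - 1 if len(s1) == f1 else s1[-1]
    cover2 = num_docs - 1 if len(s2) == f2 else s2[-1]
    cutoff = min(cover1, cover2)
    sample_size = cutoff + 1  # permuted ids 0..cutoff act as a random sample of documents
    # Within the sample both sketches are exact, so their overlap there is exact too.
    a_sample = sum(1 for d in set(s1) & set(s2) if d <= cutoff)
    # Scale the sample co-occurrence count up to the whole collection.
    return a_sample * num_docs / sample_size


def pmi(a: float, f1: int, f2: int, num_docs: int) -> float:
    """PMI = log(P(x, y) / (P(x) P(y))) with document-level probabilities."""
    if a <= 0:
        return float("-inf")
    return math.log(a * num_docs / (f1 * f2))


if __name__ == "__main__":
    # Toy data (hypothetical): items mapped to the documents they occur in.
    postings = {"arrest": {0, 1, 2, 3, 4, 5}, "charge": {2, 3, 4, 5, 6, 7, 8}}
    D = 10
    sketches = build_sketches(postings, D, k=3)
    a_hat = estimate_cooccurrence(sketches["arrest"], sketches["charge"],
                                  len(postings["arrest"]), len(postings["charge"]), D)
    print("estimated co-occurrence:", a_hat)
    print("estimated PMI:", pmi(a_hat, 6, 7, D))
```

The space saving comes from storing at most `k` ids per item instead of full posting lists or a pairwise co-occurrence table; exact item frequencies are cheap to keep separately. When `k` is at least an item's frequency, its sketch is its full posting list, and when both sketches are complete the estimate equals the exact count.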
Similar resources
Modeling Stock Return Volatility Using Symmetric and Asymmetric Nonlinear State Space Models: Case of Tehran Stock Market
Volatility is a measure of uncertainty that plays a central role in financial theory, risk management, and pricing. Volatility is the conditional variance of changes in asset prices; it is not directly observable, so it is treated as a hidden variable and estimated indirectly using approximations. To do this, two general approaches are presented in the literature of financial ...
Statistical Conditional Sampling for Variable-Resolution Video Compression
In this study, we investigate a variable-resolution approach to video compression based on a Conditional Random Field and statistical conditional sampling, in order to further improve the compression rate while maintaining high video quality. In the proposed approach, representative key-frames within a video shot are identified and stored at full resolution. The remaining frames within the video shot ...
UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling
MOTIVATION: Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through a process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for performing de novo protein conformational sampling from continuous space. However, existing computati...
Modeling and Representing Negation in Data-driven Machine Learning-based Sentiment Analysis
We propose a scheme for explicitly modeling and representing negation of word n-grams in an augmented word n-gram feature space. For the purpose of negation scope detection, we compare two methods: the simpler regular expression-based NegEx, and the more sophisticated Conditional Random Field-based LingScope. Additionally, we capture negation implicitly via word bi- and trigrams. We analyze the imp...
Global Features for Shallow Discourse Parsing
A coherently related group of sentences may be referred to as a discourse. In this paper we address the problem of parsing coherence relations as defined in the Penn Discourse Treebank (PDTB). A good model for discourse structure analysis needs to account both for local dependencies at the token level and for global dependencies and statistics. We present techniques for using inter-sentential o...